Aki Shiroshita (Epidemiology PhD student, ) developed a tailored version of DeGAUSS specifically for the EV project.

About Original DeGAUSS

DeGAUSS (https://degauss.org/) derives environmental variables from address data while preserving the privacy of protected health information (PHI). It runs Docker images locally to process the addresses: users supply a CSV file containing address information and receive an output file with the derived environmental variables.

Limitations of Original DeGAUSS

Original DeGAUSS may not be flexible enough for the needs of this project.

Improvements in the Modified DeGAUSS

Modified DeGAUSS provides clean, processed output files with all PHI removed.

What can we get through Modified DeGAUSS?

Category | Variable name | Data source | Description
Parsing and normalizing | address | libpostal | cleaned address
Geocoding | lon | TIGER/Line Street Range Address | longitude
Geocoding | lat | TIGER/Line Street Range Address | latitude
Road proximity | dist_to_1100 | U.S. Census Bureau | distance (meters) to the nearest S1100 (primary) road
Road proximity | dist_to_1200 | U.S. Census Bureau | distance (meters) to the nearest S1200 (secondary) road
Road proximity | length_1100 | U.S. Census Bureau | length (meters) of S1100 roads within a 400 m buffer
Road proximity | length_1200 | U.S. Census Bureau | length (meters) of S1200 roads within a 400 m buffer
Traffic density | length_moving | U.S. Department of Transportation Federal Highway Administration | total length (meters) of interstates, expressways, and freeways
Traffic density | length_stop_go | U.S. Department of Transportation Federal Highway Administration | total length (meters) of arterial roads
Traffic density | vehicle_meters_moving | U.S. Department of Transportation Federal Highway Administration | average daily number of vehicles multiplied by the length of interstates, expressways, and freeways (vehicle-meters)
Traffic density | vehicle_meters_stop_go | U.S. Department of Transportation Federal Highway Administration | average daily number of vehicles multiplied by the length of arterial roads (vehicle-meters)
Traffic density | truck_meters_moving | U.S. Department of Transportation Federal Highway Administration | average daily number of trucks multiplied by the length of interstates, expressways, and freeways (truck-meters)
Traffic density | truck_meters_stop_go | U.S. Department of Transportation Federal Highway Administration | average daily number of trucks multiplied by the length of arterial roads (truck-meters)
New road proximity and traffic density | dist_near | U.S. Department of Transportation Federal Highway Administration | distance (meters) to the nearest interstate, expressway, or freeway
New road proximity and traffic density | aadt_near | U.S. Department of Transportation Federal Highway Administration | annual average daily traffic (vehicles per day) on the nearest interstate, expressway, or freeway
Redlining categories | redlining | Mapping Inequality | historic Home Owners' Loan Corporation (HOLC) grades (A, B, C, and D)
Greenspace | evi_500 | LP DAAC MOD13Q1 | average enhanced vegetation index within a 500 m buffer
Greenspace | evi_1500 | LP DAAC MOD13Q1 | average enhanced vegetation index within a 1500 m buffer
Greenspace | evi_2500 | LP DAAC MOD13Q1 | average enhanced vegetation index within a 2500 m buffer
Deprivation score | fraction_assisted_income | 2018 American Community Survey | fraction of households receiving public assistance income, food stamps, or SNAP in the past 12 months
Deprivation score | fraction_high_school_edu | 2018 American Community Survey | fraction of the population aged 25 and older whose educational attainment is at least high school graduation (including GED equivalency)
Deprivation score | median_income | 2018 American Community Survey | median household income in the past 12 months, in 2018 inflation-adjusted dollars
Deprivation score | fraction_no_health_ins | 2018 American Community Survey | fraction of the population with no health insurance coverage
Deprivation score | fraction_poverty | 2018 American Community Survey | fraction of the population with income in the past 12 months below the poverty level
Deprivation score | fraction_vacant_housing | 2018 American Community Survey | fraction of housing units that are vacant
Deprivation score | dep_index | 2018 American Community Survey | composite measure of the 6 variables above
Air pollutants | average_no2_infancy | Original Schwartz model | average daily NO2 level during infancy
Air pollutants | average_bc_infancy | Provided by Kai Zhang | average monthly black carbon level during infancy
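The traffic-density variables in the table are sums over the road segments near an address. As a rough illustration only, the sketch below assumes each nearby segment is already available with its road type, length in meters, and average daily vehicle count. RoadSegment and traffic_density are hypothetical names, not part of DeGAUSS, and the step that selects which segments count as "near" the address is left out. The truck_meters_* variables would follow the same pattern, with a daily truck count in place of the vehicle count.

```python
from dataclasses import dataclass


@dataclass
class RoadSegment:
    """One road segment near an address (hypothetical structure)."""
    road_type: str   # "moving" (interstate/expressway/freeway) or "stop_go" (arterial)
    length_m: float  # segment length in meters
    aadt: float      # average daily number of vehicles on the segment


def traffic_density(segments):
    """Total length and sum of (daily vehicles x length) for each road type."""
    out = {}
    for kind in ("moving", "stop_go"):
        chosen = [s for s in segments if s.road_type == kind]
        out[f"length_{kind}"] = sum(s.length_m for s in chosen)
        # Each segment contributes its daily vehicle count times its length.
        out[f"vehicle_meters_{kind}"] = sum(s.aadt * s.length_m for s in chosen)
    return out


segments = [
    RoadSegment("moving", 200, 50_000),
    RoadSegment("moving", 100, 30_000),
    RoadSegment("stop_go", 150, 8_000),
]
print(traffic_density(segments))
# prints {'length_moving': 300, 'vehicle_meters_moving': 13000000, 'length_stop_go': 150, 'vehicle_meters_stop_go': 1200000}
```

Summing segment by segment keeps the result correct when a busy short segment and a quiet long segment are both nearby, which a single average count times total length would not.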

In addition,

How to Use Modified DeGAUSS

The environment has already been set up for you. All you need to do is follow the instructions.

Step-by-Step Instructions

  1. Locate the folder:

Navigate to the folder “C:_degauss_2025_08_14” on the Windows server (Cqshealth.dhcp.mc.vanderbilt.edu).

  2. Open the R project:

Launch RStudio and open the R project in this folder.

Note: RStudio may take 1–2 minutes to open because its settings have been customized for this project. Each time you open it, wait until objects appear in the Environment pane before continuing.

  3. Start Podman:

Open the Command Prompt and run podman machine start.

  4. Run the R script:

Open the file test.R and run it one section at a time: place your cursor inside a section and press Ctrl + Alt + T (RStudio's "Run Current Section" shortcut).

  5. Locate the output files:

Processed data are saved to the output folder of your choice.

This folder contains several CSV files, including tract.csv (used for the subject selection flow), final_data.csv (the final dataset for sharing with other researchers, with all PHI removed), tab_census.csv (census tract tabulation data), and tab_relocation.csv (relocation information).
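Because final_data.csv is the file meant for sharing, a quick look at its header can catch an identifier column that was carried through by mistake. The sketch below is an illustration under assumptions, not part of the modified DeGAUSS: PHI_COLUMNS and find_phi_columns are hypothetical, and the column list should be replaced with the identifiers actually present in the input file. It checks column names only, so it cannot detect PHI stored inside other columns.

```python
import csv

# Hypothetical: column names that would indicate PHI left in the output.
# Replace with the identifiers actually present in your input file.
PHI_COLUMNS = {"address", "lat", "lon"}


def find_phi_columns(lines, phi_columns=PHI_COLUMNS):
    """Return header names from CSV text lines that match known PHI columns.

    `lines` can be an open file object or any iterable of CSV text lines.
    """
    header = next(csv.reader(lines), [])
    # Compare case-insensitively and ignore surrounding whitespace.
    return sorted(name for name in header if name.strip().lower() in phi_columns)


# Example usage (the path is illustrative):
# with open("final_data.csv", newline="", encoding="utf-8") as f:
#     leftover = find_phi_columns(f)
# if leftover:
#     raise SystemExit(f"PHI columns still present: {leftover}")
```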

Specific instructions for Huiping

Note: Your data will remain on the shared drive and will never leave the VUMC environment. The server loads data into memory for processing but does not store them in local server folders. Any temporary cache generated during processing is removed automatically.

Could you provide the path to the input folder containing the address data, along with the file name?

What is the path to the output folder where you would like to store the processed data after all PHI has been removed?

If you would like intermediate files containing PHI to be stored in a temporary folder at a different location, please specify its path.

Defining start date and end date

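To define each person's start and end dates, we need to merge any overlapping or adjacent enrollment periods into single, continuous time spans. This ensures that continuous enrollment is represented as one span rather than being split by artificial gaps between consecutive records.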

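The TennCare enrollment file should look like this, with one row per enrollment period and both a begin date and an end date:

recip | enrol_begin_date | enrol_end_date | address
1 | 2023-01-01 | 2024-01-02 | 123 Main St
1 | 2024-01-02 | 2025-03-02 | 456 Elm St
2 | 2022-01-02 | 2023-01-02 | 789 Oak St

not like this, with only a single registration date per address:

recip | registration_date | address
1 | 2023-01-01 | 123 Main St
1 | 2024-01-02 | 456 Elm St
2 | 2022-01-02 | 789 Oak St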
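The merging rule for enrollment periods can be sketched in Python. This is a minimal illustration, not the project's R code: read_periods, merge_periods, and MAX_GAP are hypothetical names, and the sketch assumes two periods count as adjacent when the next one begins no more than one day after the previous one ends. Dates are expected in YYYY-MM-DD format, and the address column is ignored because only the dates determine the spans.

```python
import csv
import io
from datetime import date, timedelta
from itertools import groupby

# Assumption: periods are "adjacent" if the next one begins no later than
# one day after the previous one ends. Adjust if the project uses another rule.
MAX_GAP = timedelta(days=1)

REQUIRED_COLUMNS = {"recip", "enrol_begin_date", "enrol_end_date"}


def read_periods(csv_text):
    """Parse enrollment rows into (recip, begin, end) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [
        (row["recip"],
         date.fromisoformat(row["enrol_begin_date"]),
         date.fromisoformat(row["enrol_end_date"]))
        for row in reader
    ]


def merge_periods(periods):
    """Merge overlapping or adjacent periods for each recipient."""
    merged = []
    # Sort by recipient, then begin date, so each group is in time order.
    for recip, group in groupby(sorted(periods), key=lambda p: p[0]):
        cur_begin = cur_end = None
        for _, begin, end in group:
            if cur_begin is None:
                cur_begin, cur_end = begin, end
            elif begin <= cur_end + MAX_GAP:
                # Overlapping or adjacent: extend the current span.
                cur_end = max(cur_end, end)
            else:
                # Real gap: close the current span and start a new one.
                merged.append((recip, cur_begin, cur_end))
                cur_begin, cur_end = begin, end
        merged.append((recip, cur_begin, cur_end))
    return merged


example = """recip,enrol_begin_date,enrol_end_date,address
1,2023-01-01,2024-01-02,123 Main St
1,2024-01-02,2025-03-02,456 Elm St
2,2022-01-02,2023-01-02,789 Oak St
"""
for recip, begin, end in merge_periods(read_periods(example)):
    print(recip, begin, end)
# prints:
# 1 2023-01-01 2025-03-02
# 2 2022-01-02 2023-01-02
```

A file in the registration-date format fails the column check in read_periods, because it has no enrol_begin_date or enrol_end_date to merge.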

Delete modified DeGAUSS

Once all processes are completed and the required outputs are finalized, I will delete the modified DeGAUSS from the server.